Automatic Labeling of Special Diagnostic Mammography Views from Images and DICOM Headers.
Applying state-of-the-art machine learning techniques to medical images requires thorough selection and normalization of input data. One such step in digital mammography screening for breast cancer is the labeling and removal of special diagnostic views, in which diagnostic tools or magnification are applied to assist in the assessment of suspicious initial findings. Because a common task in medical informatics is prediction of disease and its stage, these special diagnostic views, which are enriched only among the cohort of diseased cases, will bias machine learning disease predictions. To automate this process, we develop a machine learning pipeline that utilizes both DICOM headers and images to predict such views automatically, allowing for their removal and the generation of unbiased datasets. We achieve an AUC of 99.72% in predicting special mammogram views when combining both types of models. Finally, we apply these models to clean a dataset of about 772,000 images with an expected sensitivity of 99.0%. The pipeline presented in this paper can be applied to other datasets to obtain high-quality image sets suitable for training disease detection algorithms.
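As a rough illustration of the final cleanup step, the per-image probabilities from a header model and an image model can be fused and thresholded for high sensitivity. This is a minimal sketch, not the authors' pipeline; the weight and threshold values are invented placeholders.

```python
def combine_scores(p_header, p_image, w_header=0.5):
    """Weighted average of the two models' special-view probabilities.
    The 50/50 weighting is an assumption, not the paper's choice."""
    return w_header * p_header + (1.0 - w_header) * p_image

def flag_special_views(scores, threshold=0.02):
    """Flag exams whose combined score exceeds a deliberately low
    threshold, trading precision for high (~99%) sensitivity so that
    nearly all special views are removed from the screening set."""
    return [s > threshold for s in scores]
```

In practice the threshold would be tuned on a validation set to hit the target sensitivity before cleaning the full dataset.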
PathologyBERT -- Pre-trained Vs. A New Transformer Language Model for Pathology Domain
Pathology text mining is a challenging task given the reporting variability
and constant new findings in cancer sub-type definitions. However, successful
text mining of a large pathology database can play a critical role to advance
'big data' cancer research like similarity-based treatment selection, case
identification, prognostication, surveillance, clinical trial screening, risk
stratification, and many others. While there is a growing interest in
developing language models for more specific clinical domains, no
pathology-specific language model exists to support rapid data-mining
development in the pathology space. In the literature, a few approaches fine-tuned
general transformer models on specialized corpora while maintaining the
original tokenizer, but in fields requiring specialized terminology, these
models often fail to perform adequately. We propose PathologyBERT - a
pre-trained masked language model which was trained on 347,173 histopathology
specimen reports and publicly released in the Huggingface repository. Our
comprehensive experiments demonstrate that pre-training a transformer model on
pathology corpora yields performance improvements on Natural Language
Understanding (NLU) and breast cancer diagnosis classification when compared to
nonspecific language models.
Comment: submitted to the American Medical Informatics Association (AMIA) 2022 Annual Symposium.
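The tokenizer argument above can be illustrated with a toy greedy longest-match (WordPiece-style) split. The vocabularies below are invented and this is not the PathologyBERT code, but it shows how a general-domain vocabulary fragments a pathology term that a domain vocabulary keeps whole.

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first subword split, WordPiece-style."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:  # no sub-piece matches: unknown token
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces

# Invented toy vocabularies -- not real BERT or PathologyBERT vocabularies.
general_vocab = {"car", "##cin", "##oma"}
domain_vocab = {"carcinoma"}

# general: ['car', '##cin', '##oma']; domain: ['carcinoma']
```

A model that sees "carcinoma" as three unrelated pieces must re-learn the term's meaning from context, which is the failure mode the abstract attributes to general tokenizers on specialized corpora.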
Was there COVID-19 back in 2012? Challenge for AI in Diagnosis with Similar Indications
Purpose: Since the recent COVID-19 outbreak, there has been an avalanche of research papers applying deep learning based image processing to chest radiographs for detection of the disease. Our purpose is to test the performance of two top models for CXR COVID-19 diagnosis on external datasets to assess model generalizability.
Methods: In this paper, we present our argument regarding the efficiency and applicability of existing deep learning models for COVID-19 diagnosis. We provide results from two popular models - COVID-Net and CoroNet evaluated on three publicly available datasets and an additional institutional dataset collected from EMORY Hospital between January and May 2020, containing patients tested for COVID-19 infection using RT-PCR.
Results: There is a large false positive rate (FPR) for COVID-Net on both the CheXpert (55.3%) and MIMIC-CXR (23.4%) datasets. On the EMORY dataset, COVID-Net has 61.4% sensitivity, 0.54 F1-score, and 0.49 precision. The FPR of the CoroNet model is significantly lower across all the datasets as compared to COVID-Net: EMORY (9.1%), CheXpert (1.3%), ChestX-ray14 (0.02%), MIMIC-CXR (0.06%).
Conclusion: The models reported good to excellent performance on their internal datasets; however, we observed in our testing that their performance dramatically worsened on external data. This likely has several causes, including overfitting due to a lack of appropriate control patients and ground truth labels. The fourth, institutional dataset was labeled using RT-PCR, which could be positive without radiographic findings and vice versa. Therefore, a fusion model of both clinical and radiographic data may have better performance and generalization.
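The external-validation numbers quoted above (sensitivity, FPR, precision, F1) all derive from the confusion matrix. A minimal sketch, with placeholder counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics reported in the Results section from raw
    confusion-matrix counts (tp/fp/tn/fn are hypothetical inputs)."""
    sensitivity = tp / (tp + fn)   # a.k.a. recall
    fpr = fp / (fp + tn)           # false positive rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "fpr": fpr,
            "precision": precision, "f1": f1}
```

Running the same function on internal and external test sets makes the generalization gap the abstract describes directly comparable.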
MedShift: identifying shift data for medical dataset curation
To curate a high-quality dataset, identifying data variance between the internal and external sources is a fundamental and crucial step. However, methods to detect shift or variance in data have not been significantly researched. Challenges to this are the lack of effective approaches to learn dense representations of a dataset and the difficulties of sharing private data across medical institutions. To overcome these problems, we propose a unified pipeline called MedShift to detect the top-level shift samples and thus facilitate medical dataset curation. Given an internal dataset A as the base source, we first train anomaly detectors for each class of dataset A to learn internal distributions in an unsupervised way. Second, without exchanging data across sources, we run the trained anomaly detectors on an external dataset B for each class. The data samples with high anomaly scores are identified as shift data. To quantify the shiftness of the external dataset, we cluster B's data into groups class-wise based on the obtained scores. We then train a multi-class classifier on A and measure the shiftness with the classifier's performance variance on B by gradually dropping the group with the largest anomaly score for each class. Additionally, we adapt a dataset quality metric to help inspect the distribution differences for multiple medical sources. We verify the efficacy of MedShift with musculoskeletal radiographs (MURA) and chest X-ray datasets from more than one external source. Experiments show our proposed shift data detection pipeline can be beneficial for medical centers to curate high-quality datasets more efficiently. An interface introduction video to visualize our results is available at https://youtu.be/V3BF0P1sxQE.
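A toy sketch of the shift-detection idea, assuming a trivial one-dimensional z-score "anomaly detector" (the paper's detectors, clustering, and quality metric are far richer):

```python
from statistics import mean, stdev

def fit_detector(internal):
    """'Train' a trivial anomaly detector on internal dataset A:
    memorize the internal mean and standard deviation."""
    return mean(internal), stdev(internal)

def anomaly_scores(detector, external):
    """Score external dataset B's samples by their standardized
    distance from the internal distribution (no data exchanged)."""
    mu, sigma = detector
    return [abs(x - mu) / sigma for x in external]

def drop_most_shifted(external, scores, frac=0.25):
    """Drop the top-`frac` highest-scoring samples, analogous to
    removing the group with the largest anomaly score per class
    when measuring the classifier's performance variance."""
    k = int(len(external) * frac)
    order = sorted(range(len(external)), key=lambda i: scores[i], reverse=True)
    keep = set(order[k:])
    return [x for i, x in enumerate(external) if i in keep]
```

Repeating the drop-and-evaluate loop group by group yields the performance-variance curve the pipeline uses to quantify shiftness.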
Study of forced degradation behaviour of eprosartan mesylate and development of validated stability indicating assay method by UPLC
The present research work describes comprehensive stress testing of eprosartan mesylate (EM) according to ICH guideline Q1A(R2), and development of a stability-indicating reversed-phase ultra-performance liquid chromatographic (UPLC) assay. The drug was subjected to acid (0.5 N HCl), neutral, and alkaline (0.5 N NaOH) hydrolytic conditions at 80 °C, and to oxidative decomposition at room temperature. Photolysis was carried out by exposing the drug during the daytime to sunlight (60,000-70,000 lux) for two days, and the oxidative study was performed with 0.5 mg/ml drug in 30% hydrogen peroxide (H2O2) at room temperature for 25 h. The solid drug was also subjected to 50 °C for 30 days in a hot-air oven. Degradation of the drug was found to occur under alkaline, acidic, and neutral hydrolytic conditions. Separation of the drug and the degradation products was successfully achieved on a BEH (bridged ethylene hybrid) C18 column (1.7 µm, 2.1 mm × 150 mm) with gradient elution of water-acetonitrile as mobile phase. The flow rate and detection wavelength were 0.1 ml/min and 232 nm, respectively. The method was validated and the response was found to be linear in the drug concentration range of 5-25 µg/ml (r2 = 0.999). The %RSD in intra-day and inter-day precision studies was < 0.8%. Recovery of the drug from a mixture of degradation products was between 98.3 and 99.8%. The LOD and LOQ of the developed method were 0.15 µg/ml and 0.45 µg/ml, respectively. The method was specific to the drug, selective to degradation products, and robust. The PDA purity test also confirmed the specificity of the method.
Colegio de Farmacéuticos de la Provincia de Buenos Aires
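The reported linearity (r2 = 0.999 over 5-25 µg/ml) comes from an ordinary least-squares calibration fit of detector response against concentration. A minimal sketch with invented response values:

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope*x + intercept and the
    coefficient of determination r^2 used to judge linearity."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2
```

With real chromatographic peak areas the fit would also feed the ICH LOD/LOQ estimates, which are derived from the response standard deviation and the calibration slope.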
OSCARS: An Outlier-Sensitive Content-Based Radiography Retrieval System
Improving the retrieval relevance on noisy datasets is an emerging need for the curation of a large-scale clean dataset in the medical domain. While existing methods can be applied for class-wise retrieval (aka inter-class), they cannot distinguish the granularity of likeness within the same class (aka intra-class). The problem is exacerbated on medical external datasets, where noisy samples of the same class are treated equally during training. Our goal is to identify both intra- and inter-class similarities for fine-grained retrieval. To achieve this, we propose an Outlier-Sensitive Content-based rAdiography Retrieval System (OSCARS), consisting of two steps. First, we train an outlier detector on a clean internal dataset in an unsupervised manner. Then we use the trained detector to generate anomaly scores on the external dataset, whose distribution will be used to bin intra-class variations. Second, we propose a quadruplet (a, p, n_intra, n_inter) sampling strategy, where intra-class negatives n_intra are sampled from bins of the same class other than the bin that anchor a belongs to, while inter-class negatives n_inter are randomly sampled from other classes. We suggest a weighted metric learning objective to balance intra- and inter-class feature learning. We experimented on two representative public radiography datasets. Experiments show the effectiveness of our approach.
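A hedged sketch of a weighted quadruplet objective in the spirit described above; the paper's exact loss, margins, and weight are not given here, so the values below are placeholders:

```python
def dist2(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def quadruplet_loss(a, p, n_intra, n_inter,
                    m_intra=0.5, m_inter=1.0, w_intra=0.5):
    """Hinge terms for both negative types, mixed by w_intra: the
    anchor is pulled toward the positive and pushed away from the
    intra-class negative (smaller margin) and the inter-class
    negative (larger margin)."""
    l_intra = max(0.0, dist2(a, p) - dist2(a, n_intra) + m_intra)
    l_inter = max(0.0, dist2(a, p) - dist2(a, n_inter) + m_inter)
    return w_intra * l_intra + (1.0 - w_intra) * l_inter
```

Using a smaller margin for the intra-class term reflects that samples within a class should still be ordered by fine-grained likeness rather than pushed as far apart as different classes.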
Performance Gaps of Artificial Intelligence Models Screening Mammography -- Towards Fair and Interpretable Models
Even though deep learning models for abnormality classification can perform
well in screening mammography, the demographic and imaging characteristics
associated with increased risk of failure for abnormality classification in
screening mammograms remain unclear. This retrospective study used data from
the Emory BrEast Imaging Dataset (EMBED) including mammograms from 115,931
patients imaged at Emory University Healthcare between 2013 and 2020. Clinical
and imaging data include Breast Imaging Reporting and Data System (BI-RADS)
assessment, region of interest coordinates for abnormalities, imaging features,
pathologic outcomes, and patient demographics. Deep learning models including
InceptionV3, VGG16, ResNet50V2, and ResNet152V2 were developed to distinguish
between patches of abnormal tissue and randomly selected patches of normal
tissue from the screening mammograms. The training, validation, and test sets
comprised 29,144 (55.6%) patches from 10,678 (54.2%) patients, 9,910 (18.9%)
patches from 3,609 (18.3%) patients, and 13,390 (25.5%) patches from 5,404
(27.5%) patients. We assessed model performance overall and within
subgroups defined by age, race, pathologic outcome, and imaging characteristics
to evaluate reasons for misclassifications. On the test set, a ResNet152V2
model trained to classify normal versus abnormal tissue patches achieved an
accuracy of 92.6% (95% CI = 92.0-93.2%) and an area under the receiver
operating characteristic curve of 0.975 (95% CI = 0.972-0.978). Imaging
characteristics associated with higher misclassification rates include higher tissue
densities (risk ratio [RR]=1.649; p=.010, BI-RADS density C and RR=2.026;
p=.003, BI-RADS density D), and presence of architectural distortion (RR=1.026;
p<.001). Small but statistically significant differences in performance were
observed by age, race, pathologic outcome, and other imaging features (p<.001).
Comment: 21 pages, 4 tables, 5 figures, 2 supplemental tables, and 1 supplemental figure.
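The risk ratios quoted above compare a subgroup's misclassification rate against a reference group's. A minimal sketch with invented counts (the paper's RRs come from its own subgroup tables):

```python
def risk_ratio(err_group, n_group, err_ref, n_ref):
    """Ratio of misclassification rates between a subgroup (e.g. a
    BI-RADS density category) and the reference group; RR > 1 means
    the subgroup is misclassified more often."""
    return (err_group / n_group) / (err_ref / n_ref)
```

In the subgroup analysis, each RR would be paired with a confidence interval and p-value before concluding that a characteristic raises the failure risk.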